April 30, 2018

Overview of Work/Research

  • Neuroimaging and R (Neuroconductor Project)
  • R Package Development/“Data Science”
  • Segmentation/Classification of:
    • White Matter Lesions in Multiple Sclerosis
    • Brain vs. Skull (CT)
    • Brain Hemorrhage/Stroke (CT)

Brain Image Processing in R

Workflow for an Analysis

  • bash
  • FSL
  • ANTs
  • MRIcroGL
  • OsiriX
  • SPM 12

Workflow for an Analysis

Multiple pieces of software used

  • all different syntax

Goal:

Lower the bar to entry

  • all R code
    • pipeline tool
    • “native” R code

Complete pipeline

  • preprocessing and analysis

What did medical imaging in R have?


Bioinformatics Repository: Bioconductor

  • centralized bioinformatics/genomics packages
  • large community/number of packages (> 1300)
  • published tutorials and workflows
  • additional requirements to CRAN (e.g. packages need vignettes)

Neuroconductor: An R Platform for Medical Imaging Analysis (Muschelli et al. 2018)

https://neuroconductor.org/

Tutorials and Workflows

Authored R Packages:

  • fslr

    (Muschelli, John, et al. “fslr: Connecting the FSL Software with R.” The R Journal 7.1 (2015): 163-175.)

  • brainR

    (Muschelli, John, Elizabeth Sweeney, and Ciprian Crainiceanu. “brainR: Interactive 3 and 4D Images of High Resolution Neuroimage Data.” The R Journal 6.1 (2014): 42-48.)

  • ichseg

    (Muschelli, John, et al. “PItcHPERFeCT: Primary intracranial hemorrhage probability estimation using random forests on CT.” NeuroImage: Clinical 14 (2017): 379-390.)

  • extrantsr
  • dcm2niir
  • matlabr
  • spm12r
  • freesurfer
  • itksnapr
  • stapler
  • gifti
  • cifti
  • papayar
  • diffr
  • gcite
  • rscopus
  • fedreporter
  • glassdoor

Number of Downloads (from cranlogs)

Lesion Segmentation of MS

Public Dataset with Lesion Segmentation (Lesjak et al. 2018)

Demographic Data

  • On many different therapies (9 no therapy), age IQR: 33 - 42, EDSS IQR: 1.5 - 4

Variable                          Overall
n                                 30
Age (mean (sd))                   39.27 (10.12)
Sex = M (%)                       7 (23.3)
EDSS (mean (sd))                  2.61 (1.88)
Lesion_Volume (mean (sd))         17.40 (16.13)
MS_Subtype (%)
  Clinically Isolated Syndrome    2 (6.7)
  Progressive-relapsing           1 (3.3)
  Relapsing-remitting             24 (80.0)
  Secondary-progressive           2 (6.7)
  Unspecified                     1 (3.3)

Imaging Data

  • 2D T1 (TR=2000ms, TE=20ms, TI=800ms) and after gadolinium
  • 2D T2 (TR=6000ms, TE=120ms)
  • 3D FLAIR (TR=5000ms, TE=392ms, TI=1800ms)
    • Fluid attenuated inversion recovery - reduce signal of fluids
  • All had flip angle of 120\(^{\circ}\)


Terminology: Neuroimaging to Data/Statistics

  • Segmentation ⇔ classification
  • Image ⇔ 3-dimensional array
  • Mask/Region of Interest ⇔ binary (0/1) image
  • Registration ⇔ Spatial Normalization/Standardization
    • “Lining up” Brains
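
The mapping above can be made concrete with a toy example. This is only a sketch: the 2 x 2 x 3 array, its intensities, and the cutoff of 2.0 are made-up values, and real images would be read from NIfTI files rather than typed in as nested lists.

```python
# Toy 2 x 2 x 3 "image": a nested list indexed as image[x][y][z].
# All intensities are invented for illustration.
image = [
    [[0.0, 1.2, 3.4], [0.1, 2.8, 0.3]],
    [[0.0, 0.2, 3.9], [2.5, 0.4, 0.0]],
]


def threshold_mask(img, cutoff):
    """Return a binary (0/1) image: 1 where the intensity exceeds the cutoff."""
    return [[[1 if v > cutoff else 0 for v in row] for row in plane] for plane in img]


def flatten(img):
    """Unroll a 3D array into one value per voxel."""
    return [v for plane in img for row in plane for v in row]


# "Segmentation" here is just a voxel-wise classification rule (hypothetical cutoff).
mask = threshold_mask(image, cutoff=2.0)
n_voxels = len(flatten(image))
n_selected = sum(flatten(mask))
print(n_voxels, n_selected)
```

The mask has the same dimensions as the image, so the two can be indexed voxel by voxel.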

Image Representation: voxels (3D pixels)

Step 1: Image Processing: Workflow

N4 bias field correction (Tustison et al. 2010) assumes the following EM-style model: \[ \log(x(v)) = \log(u(v)) + \log( f(v) ) \]

  • \(x\): given image
  • \(u\): uncorrupted image
  • \(f\): bias field
  • \(v\): location in the image
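
A small numeric check shows why the model is written on the log scale: a multiplicative bias field becomes additive after taking logs. This sketch assumes the bias field is already known, which is not the case in practice (N4 estimates it iteratively); the intensity and bias values are invented.

```python
import math

# Toy values at four locations v (all hypothetical).
u = [100.0, 100.0, 200.0, 200.0]       # uncorrupted image u(v)
f = [0.8, 0.9, 1.1, 1.2]               # smooth multiplicative bias field f(v)
x = [ui * fi for ui, fi in zip(u, f)]  # observed image x(v) = u(v) * f(v)

# On the log scale the model is additive: log x = log u + log f.
log_x = [math.log(xi) for xi in x]
log_sum = [math.log(ui) + math.log(fi) for ui, fi in zip(u, f)]

# Given a bias field estimate, correction is a subtraction on the log scale.
u_hat = [math.exp(lx - math.log(fi)) for lx, fi in zip(log_x, f)]
print([round(v, 6) for v in u_hat])
```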

Step 1: Image Processing: MALF

Figure from Multi-Atlas Skull Stripping method paper (Doshi et al. 2013):

  • Register templates to an image using the T1 for that subject
  • Apply transformation to the label/mask
  • Average each voxel over all templates
    • there are “smarter” (e.g. weighted) ways
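
The averaging step can be sketched as follows, assuming the template masks have already been registered to the subject and flattened to one value per voxel. The function name `fuse_labels`, the 0.5 threshold, and the toy masks are illustrative choices, not taken from the MASS paper.

```python
def fuse_labels(registered_masks, threshold=0.5):
    """Average binary template masks voxel-wise, then threshold the average.

    registered_masks: list of equal-length 0/1 lists, one per template,
    already transformed into the subject's space.
    """
    n_templates = len(registered_masks)
    n_voxels = len(registered_masks[0])
    prob = [
        sum(mask[v] for mask in registered_masks) / n_templates
        for v in range(n_voxels)
    ]
    fused = [1 if p > threshold else 0 for p in prob]
    return prob, fused


# Three hypothetical templates voting on five voxels.
masks = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
]
prob, fused = fuse_labels(masks)
print(fused)
```

With equal weights this is a majority vote; a weighted variant would replace the plain average with a weighted one.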

Step 2: Create Predictors for each Sequence


  • Predictors created with intensity-normalized data
    • Quantile images, smoothers, local moments
  • Tissue class probability with local moments: MALF and FAST (Zhang, Brady, and Smith 2001)
  • Z-score to a population template
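
Two of these predictor types can be illustrated on a 1D strip of voxels. This is a simplified sketch: the real predictors are computed in 3D neighborhoods, and the window radius, intensities, and template mean/SD values below are made up.

```python
import statistics


def local_moments(values, radius=1):
    """Local mean and SD in a window of +/- radius voxels (truncated at the edges)."""
    means, sds = [], []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        means.append(statistics.fmean(window))
        sds.append(statistics.pstdev(window))
    return means, sds


def z_score_to_template(values, template_mean, template_sd):
    """Voxel-wise z-score against a population template mean and SD image."""
    return [(x - m) / s for x, m, s in zip(values, template_mean, template_sd)]


intensities = [1.0, 2.0, 6.0, 2.0, 1.0]   # hypothetical normalized intensities
means, sds = local_moments(intensities)
z = z_score_to_template(
    intensities,
    template_mean=[1.0, 2.0, 2.0, 2.0, 1.0],
    template_sd=[0.5] * 5,
)
print(means, z)
```

The bright middle voxel stands out in the z-score image because it differs from what the template expects at that location.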

A package to do all this: smri.process

  • GitHub package (muschellij2/smri.process)


Data Structure for One Patient

Step 3: Aggregate Data

Training Data Structure

  • Sample 10% of the voxels (save computation time)
  • Stack together 14 randomly selected patients, stratified by age (over median) and volume
  • Train model/classifier on this design matrix
  • Smooth the probability map
  • Test on the 16 held-out patients
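
The sampling and stacking steps can be sketched as below. The patient IDs, voxel records, seed, and helper name `build_training_matrix` are hypothetical; the stratified selection of patients is not shown.

```python
import random


def build_training_matrix(patients, fraction=0.10, seed=2018):
    """Sample a fraction of each patient's voxels and stack them into one table.

    patients: dict mapping a patient ID to a list of per-voxel records (dicts).
    """
    rng = random.Random(seed)
    stacked = []
    for patient_id, rows in patients.items():
        k = max(1, round(len(rows) * fraction))
        for row in rng.sample(rows, k):
            stacked.append({"patient": patient_id, **row})
    return stacked


# Two hypothetical patients with 50 and 30 voxels; values are placeholders.
patients = {
    "p01": [{"flair": float(v), "lesion": int(v % 7 == 0)} for v in range(50)],
    "p02": [{"flair": float(v), "lesion": int(v % 5 == 0)} for v in range(30)],
}
train = build_training_matrix(patients)
print(len(train))
```

Each row of `train` is one voxel, so the classifier sees voxels from all training patients in a single design matrix.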

Step 4: Fit Models / Classifier

Let \(y_{i}(v)\) be the presence / absence of lesion for voxel \(v\) from person \(i\).

General model form: \[ P(Y_{i}(v) = 1) \propto f(X_{i}(v)) \]

  • Previous work: OASIS (Sweeney et al. 2013):

\[ f(X_{i}(v)) = \text{expit} \left\{ \beta_0 + \sum_{k} \left[ x_{k}(v)\beta_{k} + x_{k}(v) \times x_{10, k} \beta_{10,k} + x_{k}(v) \times x_{20, k} \beta_{20,k} \right] \right\} \]

\(k \in \{T1, T2, FLAIR, PD\}\).

  • With the original model w/o T1Post and a re-trained model
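
The model form above can be evaluated for a single voxel as in this sketch. It follows the formula as written on the slide; the `x10` and `x20` arguments stand for the \(x_{10,k}\) and \(x_{20,k}\) terms (smoothed versions of each sequence, to the best of my understanding of the OASIS paper), and every coefficient and intensity here is invented rather than a published OASIS estimate.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def oasis_style_probability(x, x10, x20, beta0, beta, beta10, beta20):
    """Lesion probability for one voxel.

    x, x10, x20 and the beta dicts are all keyed by sequence name k.
    """
    linear = beta0
    for k in x:
        linear += (
            x[k] * beta[k]
            + x[k] * x10[k] * beta10[k]
            + x[k] * x20[k] * beta20[k]
        )
    return expit(linear)


# One sequence only, with hypothetical values, to keep the arithmetic visible.
p = oasis_style_probability(
    x={"FLAIR": 2.0}, x10={"FLAIR": 1.0}, x20={"FLAIR": 0.5},
    beta0=-1.0, beta={"FLAIR": 0.5}, beta10={"FLAIR": 0.25}, beta20={"FLAIR": 1.0},
)
print(round(p, 4))
```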

Models Fit on the Training Data

  • \(85\) predictors were generated and used to fit:
  • Random forests (Breiman 2001), using ranger (Wright and Ziegler 2017)
    • With 5 fold cross-validation, default 500 trees, mtry: \(\sqrt{p}\)
    • With and without the T1-Post for comparison to OASIS
      \(f(X_{i}(v)) \propto\) RF

For each model (RF with and w/o T1Post, and OASIS retrained or not):

  • Estimate a probability cutoff on training data
  • Predict on test data, assess performance across all voxels in the brain
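
One way to estimate the cutoff is a grid search on the training voxels. The slides do not state which criterion was used, so this sketch assumes the cutoff is chosen to maximize the Dice coefficient; the labels, probabilities, and grid are toy values.

```python
def dice(truth, pred):
    """Dice coefficient between two binary label lists."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else float("nan")


def best_cutoff(truth, prob, cutoffs):
    """Return (cutoff, dice) for the cutoff with the highest training Dice."""
    best = None
    for c in cutoffs:
        pred = [1 if p >= c else 0 for p in prob]
        score = dice(truth, pred)
        if best is None or score > best[1]:
            best = (c, score)
    return best


# Hypothetical training voxels: manual labels and predicted probabilities.
truth = [0, 0, 1, 1, 1, 0]
prob = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
cutoff, score = best_cutoff(truth, prob, [i / 10 for i in range(1, 10)])
print(cutoff, round(score, 3))
```

The chosen cutoff is then held fixed and applied to the test-set probability maps.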

Assessing Performance

For each test scan, and over all test scans, we can calculate the following 2-by-2 table, where cells represent number of voxels and corresponding Venn diagram:

                Manual
                0      1
Auto      0     TN     FN
          1     FP     TP


Dice Coefficient (Dice 1945): \[ \text{Dice} = \frac{2\times\text{TP}}{2\times\text{TP} + \text{FN} + \text{FP}} \]
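
The per-scan and pooled calculations can be sketched as follows. The two toy scans are invented, and treating the "over all test scans" Dice as the Dice of the summed 2-by-2 tables is an assumption about how the pooling is done.

```python
def confusion_counts(manual, auto):
    """Voxel counts for the 2-by-2 table of manual vs. automated labels."""
    counts = {"TN": 0, "FN": 0, "FP": 0, "TP": 0}
    for m, a in zip(manual, auto):
        if a == 1 and m == 1:
            counts["TP"] += 1
        elif a == 1 and m == 0:
            counts["FP"] += 1
        elif a == 0 and m == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def dice(counts):
    """Dice = 2 TP / (2 TP + FN + FP); TN does not enter the formula."""
    denom = 2 * counts["TP"] + counts["FN"] + counts["FP"]
    return 2 * counts["TP"] / denom if denom else float("nan")


# Two hypothetical test scans as (manual, auto) label pairs.
scans = [
    ([0, 1, 1, 0, 1, 0], [0, 1, 0, 0, 1, 1]),
    ([1, 1, 0, 0], [1, 1, 0, 0]),
]
tables = [confusion_counts(m, a) for m, a in scans]
per_scan = [dice(t) for t in tables]
pooled = {k: sum(t[k] for t in tables) for k in ("TN", "FN", "FP", "TP")}
population_dice = dice(pooled)
print(per_scan, population_dice)
```

Because true negatives are excluded, Dice is not inflated by the large number of non-lesion voxels in the brain.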

Dice Results (Triangle is population Dice)

Variable Importance

  • Top predictors in RF model
  • T1Post not in there
  • Tissue segmentations are a large predictor

RF Predicted Volume Estimates True Volume

OASIS: not so much

Patient with Median DSI (0.63) in Test

Patient with High DSI (0.73) in Test

Brain Stem Lesions Estimated

Conclusions of Lesion Analyses

  • We can segment MS lesions reasonably well

  • Better models with larger samples

  • Needs to be more stable/accurate for a biomarker
    • Location may also be relevant and not taken into account
    • Is the brain stem an area we should focus on or remove from assessment?

Next Steps/Questions

  • Run the new processing on the 131 patients from the OASIS paper
  • Gray matter injury estimation
  • Is EDSS the clinical score we should be relating this to?
  • “Black hole” lesions using the T1-post image; these may show “active” lesions

Thank You

Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1). Springer:5–32.

Dice, Lee R. 1945. “Measures of the Amount of Ecologic Association Between Species.” Ecology 26 (3):297–302. http://www.jstor.org/stable/1932409.

Doshi, Jimit, Guray Erus, Yangming Ou, Bilwaj Gaonkar, and Christos Davatzikos. 2013. “Multi-Atlas Skull-Stripping.” Academic Radiology 20 (12). Elsevier:1566–76.

Lesjak, Žiga, Alfiia Galimzianova, Aleš Koren, Matej Lukin, Franjo Pernuš, Boštjan Likar, and Žiga Špiclin. 2018. “A Novel Public MR Image Dataset of Multiple Sclerosis Patients with Lesion Segmentations Based on Multi-Rater Consensus.” Neuroinformatics 16 (1). Springer:51–63.

Muschelli, John, Adrian Gherman, Jean-Philippe Fortin, Brian Avants, Brandon Whitcher, Jonathan D Clayden, Brian S Caffo, and Ciprian M Crainiceanu. 2018. “Neuroconductor: An R Platform for Medical Imaging Analysis.” Biostatistics.

Sweeney, Elizabeth M, Russell T Shinohara, Navid Shiee, Farrah J Mateen, Avni A Chudgar, Jennifer L Cuzzocreo, Peter A Calabresi, Dzung L Pham, Daniel S Reich, and Ciprian M Crainiceanu. 2013. “OASIS Is Automated Statistical Inference for Segmentation, with Applications to Multiple Sclerosis Lesion Segmentation in MRI.” NeuroImage: Clinical 2. Elsevier:402–13.

Tustison, Nicholas J., Brian B. Avants, Philip A. Cook, Yuanjie Zheng, Alexander Egan, Paul A. Yushkevich, and James C. Gee. 2010. “N4ITK: Improved N3 Bias Correction.” IEEE Transactions on Medical Imaging 29 (6):1310–20. https://doi.org/10.1109/TMI.2010.2046908.

Wright, Marvin N., and Andreas Ziegler. 2017. “ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.” Journal of Statistical Software 77 (1):1–17. https://doi.org/10.18637/jss.v077.i01.

Zhang, Yongyue, Michael Brady, and Stephen Smith. 2001. “Segmentation of Brain MR Images Through a Hidden Markov Random Field Model and the Expectation-Maximization Algorithm.” IEEE Transactions on Medical Imaging 20 (1):45–57. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=906424.